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Abstract: Motivated by the effectiveness of correlation 
attacks against Tor, the censorship arms race, and ob- 
servations of malicious relays in Tor, we propose that 
Tor users capture their trust in network elements using 
probability distributions over the sets of elements ob- 
served by network adversaries. We present a modular 
system that allows users to efficiently and conveniently 
create such distributions and use them to improve their 
security. To illustrate this system, we present two novel 
types of adversaries. First, we study a powerful, perva- 
sive adversary that can compromise an unknown num- 
ber of Autonomous System organizations, Internet Ex- 
change Point organizations, and Tor relay families. Sec- 
ond, we initiate the study of how an adversary might 
use Mutual Legal Assistance Treaties (MLATs) to enact 
surveillance. As part of this, we identify submarine ca- 
bles as a potential subject of trust and incorporate data 
about these into our MLAT analysis by using them as 
a proxy for adversary power. Finally, we present pre- 
liminary experimental results that show the potential 
for our trust framework to be used by Tor clients and 
services to improve security. 
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1 Introduction 

Tor and its users currently face serious security risks 
from adversaries positioned to observe traffic into and 
out of the Tor network. Large-scale deanonymization 
has recently been shown feasible [22] for a patient ad- 
versary that controls some network infrastructure or Tor 
relays. Such adversaries are a real and growing threat, as 
demonstrated by the ongoing censorship arms race [12] 
and recent observations of malicious Tor relays [32]. In 
light of these and other threats, we propose an approach 
to representing and using trust in order to improve 
anonymous communication in Tor. Trust information 
can be used to inform path selection by Tor users and 
the location of services that will be accessed through 
Tor, in both cases strengthening the protection pro- 
vided by Tor. A better understanding of trust-related 
issues will also inform the future evolution of Tor, both 
the protocol itself and its network infrastructure. Path 
selection and the evolution of the protocol and infras- 
tructure will also be informed by a more comprehensive 
understanding of potential adversaries. 

Attacks on Tor users and services include first-last 
correlation [28] , in which an adversary correlates traffic 
patterns between the client and an entry guard (i.e., a 
relay used by a client to start all connections into Tor) 
with traffic patterns between a Tor exit (i.e., a relay that 
will initiate connections outside the Tor network) and 
a network destination in order to link the client to her 
destination. They also include more recently identified 
attacks on a single end of a path such as fingerprinting 
users [6] or attacking hidden services [4] . With trust in- 
formation, users could choose trusted paths through the 
Tor network and services could choose server locations 
with trusted paths into the network in order to reduce 
the chance of these attacks. 

We propose a modular system that (i) allows users 
to express beliefs about the structure and trustworthi- 
ness of the network, (ii) uses information about the 
network, modified according to the user-provided struc- 
tural information, to produce a “world” that captures 
how compromise is propagated through the network. 
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and (iii) combines this world with the user’s trust beliefs 
to produce a Bayesian Belief Network (BBN; see, e.g., 
[14]) representing a distribution on the sets of network 
elements that an adversary might compromise. The sys- 
tem we describe is designed to produce a distribution 
on the sets of network locations that might be com- 
promised by a single adversary. In the case of multiple, 
non-colluding adversaries, multiple distributions could 
be produced. 

We illustrate how this system might work by intro- 
ducing two novel types of adversaries. First, we consider 
a powerful, pervasive adversary called The Man that 
is potentially observing any independent group of of 
Autonomous Systems (ASes), Internet Exchange Points 
(IXPs), or relay families. The user is uncertain about ex- 
actly what this adversary can observe, but she has some 
information about the risk at different locations. This 
adversary can be seen as a generalization of previous 
threat models in which an adversary might compromise 
relays in the same /16 subnet or family, or in which an 
individual AS or IXP might be malicious. 

Second, we initiate the study of the effects of Mu- 
tual Legal Assistance Treaties (MLATs) on the reach of 
adversaries; we also identify submarine cables as poten- 
tially important subjects of (dis)trust and incorporate 
data about these into our analysis. Here, we demon- 
strate the use of an MLAT database to inform analysis 
of first-last compromise. The randomized state-level ad- 
versaries that we construct for this make use of data on 
submarine cables, opening up that avenue of study in 
connection with anonymity networks. We use existing 
Tor traceroute data to give an initial understanding of 
how MLATs may expand the capabilities of adversaries. 

In addition, we present proof-of-concept experi- 
ments that show how our trust system might be used by 
client or servers to improve their security. We suppose 
that users choose paths and servers choose locations to 
minimize the risk of first-last correlation attacks by The 
Man. The results show that users and services can em- 
ploy our system to improve their security. 

The main part of our modular system was described 
in an unpublished paper [19]. The version presented here 
explicitly accounts for MLATs in the way that we use 
them, a modification that demonstrates the flexibility of 
our system. Our use of the MLAT and cable databases 
and our analysis of the effects of MLATs on the reach of 
adversaries are also new since that preliminary version 
of this work. 

Other work [20, 21] has considered the use of trust 
to improve security in Tor. The models of trust in this 
previous work have the major limitations that they only 


can be used to describe Tor relays and that they assume 
each relay has an independent chance of compromise. 
The framework we present here represents a significant 
advance in that it includes a diverse set of network el- 
ements, including elements such as IP routers or IXPs 
that exist only on the paths between Tor relays. We al- 
low new types of network elements to be added in nat- 
ural ways. Another contribution of our system is that 
it can be used to represent arbitrary probability distri- 
butions over the sets of network elements, and yet we 
show how the most likely distributions can be efficiently 
represented and used. 

The body of this paper provides a high-level view of 
our system, starting with an overview of its operation 
and what the system provides in Sections 2 and 3. We 
describe in Section 4 how the system-provided informa- 
tion is combined with user beliefs to produce a BBN. 
We discuss some issues related to users’ trust beliefs 
in Section 5. We present The Man in Section 6. In Sec- 
tion 7, we discuss MLATs and analyze their implications 
for adversary capabilities; the randomized construction 
of hypothetical adversaries for that analysis is guided 
by countries’ connections to submarine cables. We then 
present, in Section 8, experimental results from our trust 
system. We close in Section 9 with a discussion of the 
implications of the work presented here and a sketch of 
ongoing and future work. As noted throughout, some 
additional details are provided in the appendices. 


2 System Overview 

We survey our system, which is largely modular. This al- 
lows it to be extended as new types of trust information 
are identified as important, etc. The system comes with 
an ontology that describes types of network elements 
(e.g., AS, link, and relay-operator types), the relation- 
ships between them that capture the effects of compro- 
mise by an adversary, and attributes of these things. 
While we provide an ontology, this may be replaced by 
another ontology as other types of threats are identified. 
Section 3.1 describes the requirements for replacement 
ontologies. Roughly speaking, the ontology identifies the 
types of entities for which the system can automatically 
handle user beliefs when constructing the Bayesian Be- 
lief Network (BBN) for the user. A user may express 
beliefs about other types of entities, but she would need 
to provide additional information about how those en- 
tities relate to entities whose types are in the ontology. 
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The ontology is provided to the user in order to facilitate 
this. 

In general, we expect that the system will provide 
information about network relationships, such as which 
ASes and IXPs are on a certain virtual link or which Tor 
relays are in a given relay family. We generally expect 
the user to provide information about human-network 
relationships such as which individual runs a particular 
relay. Note that this means the user might need to pro- 
vide this type of information in order to make some of 
her beliefs usable. For example, if she has a belief about 
the trustworthiness of a relay operator, she would need 
to tell the system which relays that operator runs in 
order for the trustworthiness belief to be incorporated 
into the BBN. 

Using the ontology and various published informa- 
tion about the network, the system creates a preliminary 
“world” populated by real-world instances of the ontol- 
ogy types (e.g., specific ASes and network operators). 
The world also includes relationship instances that re- 
flect which particular type instances are related in ways 
suggested by the ontology. User-provided information 
may include revisions to this system-generated world, 
including the addition of types not included in the pro- 
vided ontology and instances of both ontology-provided 
and user-added types. The user may also enrich the in- 
formation about the effects of compromise (adding, e.g., 
budget constraints or some correlations). 

The user expresses beliefs about the potential for 
compromise of various network entities; these beliefs 
may refer to specific network entities or to entities that 
satisfy some condition, even if the user may not be able 
to effectively determine which entities satisfy the con- 
dition. This user-provided information is used, together 
with the edited world, to create a Bayesian Belief Net- 
work (BBN) that encodes the probability distribution 
on the adversary’s location that arises from the user’s 
trust beliefs. A user may express a belief that refers to 
an entity or class of entities whose type is in the given 
ontology. For such beliefs, the system will be able to au- 
tomatically incorporate those beliefs into the BBN that 
the system constructs. A user may also express beliefs 
about entities whose types are not included in the ontol- 
ogy. If she does so, she would need to provide the system 
with information about how those entities should be put 
into the BBN that the system constructs. 

The system and the user need to agree on the lan- 
guage(s) in which she will express her beliefs. Different 
users (or, more likely, different organizations that want 
to provide collections of beliefs) may find different lan- 
guages most natural for expressing beliefs. The language 


specification(s) must describe not only the syntax for 
the user but also (i) how her structural beliefs will be 
used in modifying the system-generated world and (ii) 
how her other beliefs will be used to translate the edited 
world into a BBN. 

Once constructed, the BBN can be used, e.g., to 
provide samples from the distribution of the Tor re- 
lays and Tor “virtual links” (transport-layer connections 
with Tor relays) that are observed by the adversary. The 
motivating application is to use these samples to inform 
more secure path selection in Tor. 

2.1 Construction sequence 

An overview of the system’s actions is as follows. The 
various attributes and beliefs mentioned here are de- 
scribed in detail in the following sections. 

1. World generation from ontology: rW^ 

- As described in Section 3.3, the system generates 
a preliminary view of the world based on the on- 
tology and its data sources. We denote the result 
by rWt- 

- This includes system attributes. 

2. Augmenting the types with the user’s types: rW^, 

- The user may provide additional types (as a pre- 
lude to adding instances of those types to the 
world). We use rW^, to denote the augmentation 
of rW^ by adding the user’s types. 

3. Adding user-specified instances of types (ontology 
and user-provided): rW^' 

- The user may add instances of any of the types 
in rW^,. We use rW^' to denote the augmenta- 
tion of rWx' by adding these new instances and 
removing any that the user wishes to omit. 

4. Adding user-specified relationships (between in- 
stances in flWf/): RiW^i 

- The user may specify additional parent/child rela- 
tionships beyond those included in rW^,. In par- 
ticular, any new instances that she added in the 
previous step will not be related to any other in- 
stances in the world unless she explicitly adds such 
relationships in this step. We use ij'Wf/ to denote 
the augmentation of rW^, by adding these new 
relationships and by removing any that the user 
wishes to omit. 

5. Edit system-provided attributes (not budgets or com- 
promise effectiveness). 

6. Add new user-provided attributes. 

7. Add budgets. 
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8. Add compromise effectiveness (if values are not given, 
this defaults to a value provided by the ontology; 
for relationships of types not given in the ontology, 
we will use a default value unless the user specifies 
something when providing the relationship instance). 

9. Produce BBN. 

- In this overview, this process is treated as a black 
box. In practice, it involves many steps that de- 
pend on the belief language used. The procedure 
for the belief language described in Sec. 4.2 is pre- 
sented in detail in Sec. 4.3. 


3 Ontology and World 

Before presenting the ontology that we use in this work, 
we describe our general requirements for ontologies in 
this framework. This allows our ontology to be replaced 
with an updated version satisfying these requirements. 

3.1 General requirements for ontologies 

We assume that any ontology used in our system has 
the following properties: 

- It has a collection T of types. We use the ontology to 
describe relationships between the types in the on- 
tology. 

- A collection £ of (directed) edges between types (with 
£ nT =0). The edges are used to specify relation- 
ships; if there is an edge from Ti to T 2 in the ontology, 
then the compromise of a network element of type Ti 
has the potential to affect the compromise of a net- 
work element of type T 2 . 

- Viewed as a directed graph, {T,£) is a DAG. 

- A distinguished set of T called the output types. This 
is for convenience; these are the types of instances 
that we expect will be sampled for further use. We 
generally expect the output types to be exactly the 
types in the ontology that have no outgoing edges. 

- Each element of U f has a label that is either “sys- 
tem” or “user.” For an edge e from type Ti to type 
T 2 , if either Ti or T 2 has the label “user,” then e 
must also have the label “user.” These labels will be 
used to indicate the default source of instances of 
each type. (However, the user may always override 
system- provided information.) 

Types or edges with the label “user” might be nat- 
ural to include in an ontology when the type/edge 
is something about which the system cannot reliably 


obtain information but the ontology designer is able 
to account for instances of the edge/type in the BBN- 
construction procedure. 

- A collection A of attributes. Each attribute includes 
a name, a data type, a source (either “system” or 
“user”). Each element of 7~U£ may be assigned mul- 
tiple boolean combinations of attributes; each com- 
bination is labeled with either “required” or “op- 
tional.”^ 

Other ontologies may modularly replace the one de- 
scribed here if they satisfy the assumptions described 
above. 

3.2 Our ontology 

Figure 1 shows the elements of our ontology. Rounded 
rectangles are types; instances of these will be factor 
variables in the BBN produced by the system. Ovals 
are output types: Tor relays and (virtual) links be- 
tween clients and guards and between exits and destina- 
tions. Cylinders are attributes, whose interpretation is 
described below. With the exception of Relay Software 
and Physical Location, which the system provides but 
the user may modify, these attributes are provided by 
the user. The user may also provide new attributes. 

Directed edges show expected relationships between 
types. For example, the edge from the “AS” type to 
the “Router/switch” type indicates that we expect that 
the compromise of an AS will likely contribute to the 
compromise of one or more routers and switches. This 
edge is dashed in Fig. 1 to reflect the label “user,” i.e., 
we currently expect the user to identify which AS con- 
trols a particular router or switch if that effect is to be 
incorporated into the BBN construction. Other dashed 
edges and the unfilled types/attributes are also elements 
that we expect to be provided by the user. Solid edges 
and filled-in types correspond to elements and attributes 
whose label is “system;” we expect the system provide 
information about these. 


1 In the rest of this paper, we assume that each combination is 
just a single “optional” attribute without any connectives. The 
semantics of individual attributes depend on the translation pro- 
cedure that produces the BBN. We expect that a boolean combi- 
nation of attributes will be interpreted as possible combinations 
of attributes that the translation procedure can handle; for ex- 
ample, it might be able to process either a pair of integers or a 
single real value. Richer applications of the “optional” and “re- 
quired” labels might be allowed as well, although we do not need 
them here. 
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Fig. 1. G raphical depiction of the system’s ontology 

3.2.1 User-provided types 

The types and relationships that are provided by the 
system in constructing the preliminary world are de- 
scribed in Section 3.3. We describe the others here; in- 
stances of these are added by the user in ways specified 
below. 

Hosting Service (and incident edges) Hosting 
services that might be used to host Tor relays. If 
a service hosts a particular relay, there would be a 
relationship instance from the service to the relay. If 
a service is known to be under control of a partic- 
ular legal jurisdiction or company, the appropriate 
incoming relationship instance can be added. 
Corporation (and incident edges) Corporate con- 
trol of various network elements may be known. A 
corporation that is known may be added as an in- 
stance of this type. If the corporation is known to 
be subject to a particular legal jurisdiction, then a 
relationship edge from that jurisdiction to the cor- 
poration can be added. Similarly, hosting services, 
ASes, and IXPs that a corporation controls may be 
so indicate via the appropriate relationship instances. 
Router /switch/etc. This corresponds to a physical 
router or switch. We do not attempt to identify these 
automatically, but ones known to the user (or a 


source to which the user has access) may be added 
as instances of this type. 

Physical connection Particular physical connec- 
tions, such as a specific cable or wireless link, may 
be known and of interest. 

(Physical connection, Virtual link) If a virtual 
link is known to use a specific physical connection, 
then that can be reflected in a relationship between 
the two. 

3.2.2 Attributes 

The attributes in our ontology are depicted by cylinders 
in Fig. 1. The two at the box in the top right can be 
applied to all non-output type instances, so we do not 
explicitly show all of the types to which they can be 
applied. 

System-generated attributes These include relay- 
software type and router/switch type. Users may edit 
these, e.g., to provide additional information. 
Connection type This is an attribute of physical- 
connection instances. It is represented as a string 
that describes the type of connection (e.g., 
"submarine cable", "buried cable", or "wireless 
connection"). A user would express beliefs about 
connection types; if the type of a connection is cov- 
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ered by the user’s beliefs, then the probability of com- 
promise would be affected in a way determined by the 
belief in question. 

Budget This attribute, which is supplied by the user at 
her option, may be applied to any non-output type in- 
stance. There are two variants. Both are represented 
as an integer k and another value. In the first variant, 
the other value is a type; in the second variant, the 
other value is the string "all". Multiple instances 
of this attribute may be applied to a single type in- 
stance as long as they have distinct second values; if 
one of these is the second variant, then all others will 
be ignored. This allows the user to express the belief 
that, if the type instance is compromised, then its re- 
sources allow it to compromise k of its children. In the 
first variant of this attribute, the instance may com- 
promise k of its children of the specified type (and 
perhaps k' of its children of a different type, if so 
specified by a different belief). In the second variant 
of this attribute, the instance may compromise k of 
its children across all types. ^ 

As discussed below, we must approximate the effects 
of resource constraints so that the BBN can be effi- 
ciently sampled. 

Region This is an attribute of legal jurisdiction. It is 
represented as a boolean predicate on geographic co- 
ordinates. 

Compromise effectiveness This attribute is syntac- 
tically similar to the budget attribute. It is supplied 
by the user at her option for instances of any non- 
output type, and there are effectively two variants. 
This is represented as a probability p G [0, 1] and a 
boolean predicate on type instances; we distinguish 
non-trivial predicates from the always-true predicate 
T. Multiple instances of this attribute may be applied 
to a single type instance as long as no two non-T 
predicates evaluate to True on the same input. Only 
one instance of this attribute with T may be present; 
if it is, then all other instances of the attribute for 
the type instance are ignored. 

This attribute allows the user to express beliefs about 
the effect of compromise of one type instance on its 
children, either uniformly or according to type. For 
example, a compromised AS might attempt to com- 
promise all of its routers; with some probability (e.g.. 


2 The resources needed to compromise instances of different 
types may vary widely. However, we include the second variant 
so that a budget that covers all of an instance’s children can be 
modeled in some fashion. 


p = 10”^), it might make a mistake in the configura- 
tion file for a certain router model that would prevent 
it from compromising routers of that model that are 
not otherwise compromised. However, if such a mis- 
take is not made, then the AS will compromise all 
routers of that model; this is in contrast to the ef- 
fects of budget beliefs. 

Router/ Switch Kind This is an attribute of 
routers/switches and is represented as a set of strings. 
We expect the user to use this to describe aspects 
of routers/switches that she might know about and 
want to use in her trust beliefs, e.g., the model 
number or firmware version of specific routers and 
switches. 

Relay Hardware This is an attribute of relays and 
is represented in the same way as the router/switch 
kind. Also analogously to that attribute, we expect 
that the user would use this to describe aspects of 
relay hardware that she might know about and po- 
tentially use in her trust beliefs. 

3.3 System-generated world 

The system provides users with a world consisting of 
type instances and relationship instances that are con- 
sistent with the types and relationships specified in the 
ontology. Formally, a world is a DAG in which each 
vertex is a type instance, each edge is a relationship 
instance, and an attribute function assigns each ver- 
tex a vector of attributes. A type instance represents 
a real-world object of the specified type. For example, 
“AS3356” is a type instance of the AS type, and “Level 
3 Communications” is a type instance of the AS Orga- 
nization type. A relationship instance will only relate 
two instances of types that are related in the ontology. 
For example, (Level 3 Communications, AS3356) is an 
instance of the (AS Organization, AS) relationship type 
and indicates that AS3356 is a member of Level 3 Com- 
munications. The attributes of a type instance provide 
information that users can incorporate into their trust 
beliefs, such as the location of a given Tor relay. The 
world can be modified by users in ways provided by 
the trust language. We assume that each instance has a 
unique identifier and an indication of the type of which 
it is an instance. 

For our ontology, the system generates a world as 
follows: 

1. The current Tor consensus and the server descriptors 
it references are used to create the following instances 
and attributes, which concern relays: 
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- Tor Relay: An instance is created for each relay 
in the consensus. 

- Relay Family: An instance is created for each 
connected component of relays, where two relays 
are connected if they mutually reference each other 
in the family section of their descriptors [11]. 

- (Relay Family, Tor Relay): An instance of this 
relationship is created for each relay belonging to 
a given family. 

- Relay Software Type: This attribute is added to 
each relay based on the operating system reported 
in the relay’s descriptor. 

2. Standard techniques [22] are used to construct an AS- 
level Internet routing map. Data that can be used 
to create such a map includes the CAIDA internet 
topology [8], the CAIDA AS relationships [7], and 
RouteViews [31]. This map is then used to create the 
following instances: 

- Virtual Link: An instance is created represent- 
ing the path between each Autonomous System 
and possible guard as well as between each Au- 
tonomous System and exit. A possible guard is a 
Tor relay that satisfies the requirements to serve 
as an entry guard. Guards and exits are deter- 
mined from the Tor consensus. A virtual-link in- 
stance represents both directed paths between the 
Autonomous System and relay, which may differ 
due to Internet route asymmetries [15]. 

- AS: An instance is created for each AS observed 
in the RouteViews data. 

- (AS, Virtual Link): An instance of this relation- 
ship is created for each AS that appears on the 
path in either direction between the virtual link’s 
AS and its relay, as determined by the Internet 
routing map. 

3. Internet Exchange Points (IXPs) are added to paths 
in the AS-level Internet map based on data from the 
IXP Mapping Project [3]. These additions are used 
to create the following instances: 

- IXP: An instance is created for each IXP that 
appears on at least one path in the Internet map. 

- (IXP, Virtual Link): An instance of this rela- 
tionship is created for each IXP that appears on 
the path in either direction between the virtual 
link’s AS and its relay, as determined by the In- 
ternet routing map. 

4. ASes are clustered into organizations using the re- 
sults of Cai et al. [5], and IXPs are clustered into 
organizations using the results of Johnson et al. [22]. 
Each cluster represents a single legal entity that con- 


trols multiple ASes or IXPs, such as a company. The 
clusters are used to create the following instances: 

- AS Organization: An instance is created for each 
AS cluster. 

- IXP Organization: An instance is created for 
each IXP cluster. 

- (AS Organization, AS): An instance of this re- 
lationship is created for each AS in a given AS 
cluster. 

(IXP Organization, IXP) : An instance of this 
relationship is created for each IXP in a given IXP 
cluster. 

5. The system provides physical locations and legal ju- 
risdictions for several of the ontology types. IP loca- 
tion information, such as from the MaxMind GeoIP 
database [26] , provides location information for enti- 
ties with IP addresses. The location of IXPs is fre- 
quently available on the Web as well [3]. The bilat- 
eral MLATs that might apply are obtained from the 
MLAT.is database [9]. These data are used to create 
the following instances and attributes: 

- Legal jurisdiction: An instance of this type is 
created for each country. 

- (Legal jurisdiction. Relay): An instance of this 
relationship is created for each relay in a given 
country, as determined by the relay’s IP address 
and the IP location information. 

- (Legal jurisdiction, IXP): An instance of this 
relationship is created for each IXP in a given 
country, as determined by the IP addresses of the 
IXP or other public IXP information. 

- Physical location: This attribute is added to 
each relay with its geographic coordinates (i.e., 
latitude and longitude), as determined from its IP 
address. This attribute is also added to each IXP 
with its geographic coordinates, based on its IP 
addresses or other public IXP information. 

- MLAT: As a preliminary step, for each country 
instance G, create a duplicate country instance C\ 
and add a relationship instance from C' to C. Eor 
each in-force, bilateral MLAT, create an MLAT in- 
stance (we assume there is at most one per pair of 
countries). Eor each MLAT instance M, if Ci and 
C 2 are the instances of the two countries involved 
in the corresponding MLAT, add relationship in- 
stances from C[ and C 2 to M and from M to Ci 
and G 2 . The duplicate C' instances will be the ini- 
tially compromised ones. The structure described 
here will propagate this compromise to the origi- 
nal C instances, either directly or through MLATs. 
Here, we take the default effectiveness to be 1, i.e.. 
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each country always compromises its MLAT part- 
ners, but this may be changed by the user on a 
per-MLAT basis. 

6. Although the system does not provide information 
about physical connections in general, it can use a ca- 
ble database such as the TeleGeography database [29] 
or Greg’s Gable Map [25] to add a cable instance for 
each cable in the database. It would still be left to 
the user to identify which virtual links use which ca- 
bles, although incorporating this into the system is a 
topic of ongoing work. 


4 Beliefs and BBNs 

The user may provide various data to inform the oper- 
ation of the system. However, many users may not wish 
to do this, and the system includes a default belief set 
designed to provide good security for average users. In 
Section 6 we describe a possible default belief set. For 
simplicity, we refer to beliefs as being provided by the 
user, but wherever they are not, the defaults are used 
instead. 

4.1 User beliefs 

Broadly, users may have two kinds of beliefs: those 
about the structure of the network, etc., and those about 
trust. The user’s structural beliefs are used to edit the 
system-generated world to produce an “edited world;” 
we expect this will be done once, not on a per-adversary 
basis. These beliefs may describe new types and the ad- 
dition or removal of type instances and relationships be- 
tween them (e.g., adding relay operators known to the 
user). The user may also define new attributes, change 
the system-provided attributes, or provide values for 
empty attributes (e.g., labeling countries by their larger 
geographic region). 

The user’s beliefs may incorporate boolean pred- 
icates that are evaluated on instances in the revised 
world. For example, the user may have increased trust 
in ASes above a certain size. We sketch a suitable lan- 
guage for this in App. A, but this can be replaced with 
another if desired. 

A user may have structural beliefs about instances 
of types and edges from the ontology. For types, a user 
may believe that an instance of that type exists; her be- 
lief about that instance must include a unique identifier 
for the instance and any required attributes. This type 


instance is then added to the system-generated world. 
The type of the instance may be system-generated, in 
which case this belief represents an edit to the system- 
generated world, or it may be user-generated. If the in- 
stance’s type is user-generated, then the user must de- 
scribe to the system how the instance should be trans- 
lated to the BBN that the system produces from the 
edited world. 

For edges, a user may believe that one type in- 
stance is the parent of another type instance. Her belief 
about such a relationship must include any required at- 
tributes of the corresponding edge type in the ontology. 
This relationship instance is then added to the system- 
generated world. If the edge type is not part of the on- 
tology, the user must describe how the edge affects the 
computation of values in the BBN that the system pro- 
duces. 

Finally, the user provides trust beliefs of four types 
that are used in constructing the BBN from the revised 
world. The first two types of trust beliefs concern the 
propagation of compromise. Budget beliefs allow the 
user to say that an instance I in the edited world has 
the resources (monetary or otherwise) to compromise 
k of its children that satisfy some predicate P. Enforc- 
ing this as a hard bound appears to be computationally 
harder than we are willing to use in the BBN, so we do 
this in expectation. Compromise-effectiveness (CE) be- 
liefs allow the user to express some correlations between 
the compromises of nodes by saying that, if an instance 
I is compromised, then, with probability p, all of /’s 
children satisfying a predicate P are compromised. Eor 
example, this captures the possibility that a compro- 
mised AS compromises all of its routers except those of 
a particular model, for which the AS has made an error 
in their (common) configuration file. 

The other two types of trust beliefs concern the like- 
lihood of compromise. Relative beliefs allow the user to 
say that instances satisfying a given predicate (e.g., re- 
lays running a buggy OS, network links that traverse a 
submarine cable, or ASes that are small as determined 
by their number of routers) have a certain probability 
of compromise. (In particular, it specifies the probabil- 
ity that they remain uncompromised if they are other- 
wise uncompromised.) Absolute beliefs allow the user 
to say that instances satisfying a given predicate (e.g., 
the node is an AS and the AS number is 7007) are com- 
promised with a certain probability, regardless of other 
factors. 
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4.2 Sample belief language 

We now describe a sample language for users’ struc- 
tural and trust beliefs. This incorporates predicates, 
which might be expressed using the predicate language 
just outlined. In general, we assume that there is a 
set V of values that the user may use to express lev- 
els of trust. We illustrate this here by taking V to be 
{SC,LC,U,LT,ST}; we think of these as “Surely Com- 
promised,” “Likely Compromised,” “Unknown,” “Likely 
Trustworthy,” and “Surely Trustworthy.” Our examples 
will not rely on V having exactly five elements, but we 
think this is one natural way that users might think 
about their trust in network elements. 

4.2.1 Structural beliefs 

Let TZ be the set of relationship instances in the system- 
created world. TZ' will be TZ augmented with all of the 
user-specified relationships. 

Novel types A user may define new types via expres- 
sions of the form {"ut” ,tname, structreq, structopt), 
where ”ut” is a string literal, tname is a string (the 
name of the type) that must be distinct from all other 
tname values the user specifies and from all elements 
of T, and where structreq and structopt are both de- 
scriptions of data structures (these may be empty 
data structures, which might be indicated by NULL). 
We write T' for the set containing the elements of 
T together with all of the tname values provided by 
the user. 

Type instances An ordered list of tuples {T,D,n) 
T G £> is a data structure that is valid for T, 
and n is a unique identifier among these tuples.^ 

We write I' for the set formed by augmenting I with 
these new instances. 

Relationship instances A set of pairs {P,C), where 
P (parent) and C (child) are type instances from X' 
We do not need to specify new relationship types, 
only the additional relationship instances. 


3 We assume that the system provides unique identifiers for the 
system-generated type instances and that the values of n in the 
user’s list of tuples are distinct from those identifiers. 

4 We abuse notation and use P and C in place of the unique 
identifiers associated with each type instance in the edited world. 


4.2.2 Trust beliefs 

Relative beliefs These are beliefs of the form (s, P, v), 
where s is a string other than ”abs”, P is a predicate 
on factor variables, and v E V. 

Note that, in our translation procedure below, rel- 
ative beliefs affect the probability of compromise of 
a factor variable in the BBN that is not otherwise 
compromised through the causal relationships cap- 
tured in the world. 

Absolute beliefs These are beliefs of the form 
("abs", P,u), where P is a predicate on factor vari- 
ables and V E V. A belief such as this says that the 
chance a variable satisfying P is compromised is cap- 
tured by V. Note that it is the user’s responsibility 
to ensure that no two different absolute beliefs have 
predicates that are simultaneously satisfied by a node 
if those beliefs have different values for v. We do not 
specify what value is used if this assumption is vio- 
lated.® 

Budget Expressed as either ("bul", /, T, fc) or (^'bu2", 
/, T,fc), where "bul" and "bul” are string literals, I 
is a type instance in the edited world, T is a type 
in the edited world, and k is an integer. The inter- 
pretation is that, in expectation, compromise of the 
type instance with a Budget attribute will lead to 
compromise of k of its children (of type T in the first 
variant, or of all its children in the second variant). 

Compromise effectiveness Expressed as either 
( cel ,/, 

Pce,u) or ("ce2", /, T, u), where "cel" and "ce2" 
are string literals, I is an instance of a non-output 
type in the edited world. Pee is a predicate on in- 
stances of a fixed type, T is a distinguished symbol, 
and V E V. The interpretation is that, if instance 
/ is compromised, then it compromises its children 
satisfying Pee (or all children, if T is given) with 
probability corresponding to v. 

The actual probabilities that a compromised net- 
work element compromises other elements it controls, 
which CE beliefs attempt to capture, may tend to fall 
in a different range than other probabilities of com- 
promise. Our translation procedure could be modi- 
fied to treat the value u in a CE belief as a different 
probability than is used for other types of beliefs. 
Similarly, the belief language could be modified to 


5 A natural approach is to allow the use to specify these in an 
ordered list and using the last satisfied predicate. 
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allow CE beliefs to include probability instead of a 
value from V. 

4.2.3 Five-valued example 

The following examples of beliefs illustrate how a user 
might express her beliefs in our five- valued example lan- 
guage. 

1. Countries in set Si are likely trustworthy. 

2. Countries in set S '2 are likely compromised. 

3. Countries in set S 3 are surely compromised. 

4. AMS-IX points are likely trustworthy. 

5. MSK-IX points are of unknown trustworthiness. 

6. Relay family Fi is likely compromised. 

7. Relay family F 2 is surely uncompromised. 

8. Relay operator Oi is surely uncompromised. 

9. Relay operator O 2 is likely uncompromised. 

10. Hosting company Hi is surely trustworthy. 

11. Submarine cables are of unknown trustworthiness. 

12. Wireless connections are likely compromised. 

13. Relays running Windows are of uknown trustworthi- 
ness (the system gets OS information from relay de- 
scriptors). 

14. If an AS is compromised, then it is expected to be 
able to compromise 4 of the finks that it is on. 

We suggest that the compromise probabilities corre- 
sponding to the values SC, LC, U, LT, and ST might be 
taken by the sytem to be 0.999, 0.85, 0.5, 0.15, and 0.02, 
respectively. However, the user would express her beliefs 
in terms of “surely compromised,” etc., as above. What- 
ever language is used to express beliefs, there would need 
to be an appropriate interface for users to express or im- 
port beliefs. 

4.3 Translations to BBNs 

A translation procedure in general needs to take the 
edited world (reflecting the structural beliefs and at- 
tribute values provided by the user) and the user’s trust 
beliefs as input and produce a BBN as output. The out- 
put variables of the BBN should match the nodes in the 
edited world that are instances of types designated as 
output types in the ontology or the user’s structural be- 
liefs. Here, we present a translation procedure that fits 
with the rest of the system we describe (it matches our 
particular ontology, etc.). 

As a component of our system, BBNs have both 
strengths and weaknesses. Their general strengths of 
being concise, being efficiently sampleable, and allow- 


ing computation of other properties of the distribution 
(e.g., marginal probabilities and maximum likelihood 
values) are beneficial in our system. BBNs are espe- 
cially well-suited to our approach here because of the 
close structural similarity between our revised worlds 
and the BBNs we construct from these. 

As a disadvantage, BBNs do not represent hard 
resource constraints efficiently; we can only approxi- 
mate those here by constraining resources in expecta- 
tion. More generally, other negative correlations may 
be difficult at best to capture, but it is possible that 
users will hold beliefs that imply negative correlations 
between compromise probabilities. 

The purpose of this system is to produce an effi- 
ciently sampleable representation of compromise proba- 
bilities. Other representations of distributions could also 
be used, but they might be most naturally generated 
from trust beliefs in different ways. A detailed discus- 
sion of such approaches is beyond the scope of this work. 

4.3.1 Our translation procedure 

We now describe a translation procedure for the ontol- 
ogy and beliefs that we have presented above. Let W' 
be the final world that appears in the construction se- 
quence described above. 

- For each node (type instance) in W\ the BBN con- 
tains a corresponding factor variable/node. We refer 
to the BBN node by the same name as the node in 
W'. 

- For each compromise-effectiveness belief B = 
{s,n, P,u) about a node n, there is a corresponding 
child vb of n in the BBN. The table for us is such 
that, if n is uncompromised, then Ub is uncompro- 
mised; if n is compromised, then ub is compromised 
with probability p{v) and uncompromised otherwise. 
(We use p{v) to denote the probability value that the 
system assigns to the value u G V that is part of the 
user’s belief language.) The children of vb in the 
BBN are the nodes in the BBN that correspond to 
nodes in W' that (1) are children of n and (2) satisfy 
the predicate P from the belief B. Assign these edges 
the weight set {!}; this auxiliary information will be 
used to construct the BBN’s probability tables. 

If there are children of n in W' that do not satisfy 
any of the predicates in the compromise-effectiveness 
beliefs about n (including, e.g., when the user has 
no compromise-effectiveness beliefs), then make these 
nodes children of n in the BBN. Assign to each of 
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these edges the singleton weight set whose element is 
the appropriate default probability.® 

- For each budget belief B = (s,n, P,k) about a node 
n, let c„,p be the number of children of n (in W') that 
satisfy P. For each of these children, in the BBN, 
replace the single value in the edge’s weight set by 
that value multiplied by fc/c„,p. 

- Assign to each non-CE-belief node n a “risk set” Rn 
that is initially empty. We add to Rn values that 
describe additional risk of n being compromised: For 
each belief B = (s, P,u) that has not already been 
evaluated and whose initial entry is not ”abs^^, if n 
satisfies P, then add v to Rn (retaining duplicates, so 
that Rn is a multiset). 

- Construct the tables for each non-CE node in the 
BBN. (We have already constructed the tables for 
the CE-belief nodes.) Let n be a non-CE node. Eor 
each subset S of n’s parents, if S is the multiset of 
weights on the edges from nodes in S to n, and if R 
is the multiset of risk weights associated with n, then 
the probability that n is compromised given that its 
set of compromised parents is exactly S is: 

1- (lI(l-P)) • 

VpeS / \qeR / 

Note that, if the user has no parents, then the first 
product will be empty (taking a value of 1), and the 
probability of compromise will be determined solely 
by the risk factors unless the user expresses beliefs 
that override these. 

- If the user provides a belief B = ("abs", P,u), then 
nodes satisfying P are disconnected from their par- 
ents. Their compromise tables are then set so that 
they are compromised with probability p{v) and un- 
compromised with probability 1 —p{v). This allows a 
user to express absolute beliefs about factor variables 
in the BBN (hence “abs”). In particular, she may ex- 
press beliefs about input variables whose compromise 
would otherwise be determined by their attributes. 


6 We assume that there are default values — perhaps just a sin- 
gle, common one — for the probability that the compromise of a 
node leads to the compromise of its children. These values might 
naturally depend on the types involved. Here, we suggest 1 as a 
common default value. 


4.3.2 Potential extensions 

We assume that adversaries are acting independently, 
although this may not always be the case. One natu- 
ral example of inter-adversary dependence occurs with 
the compromise of resource-constrained instances in the 
world. Eor example, an ISP’s resources may limit it to 
monitoring k of its routers. If both the ISP and the coun- 
try (or other legal jurisdiction) controlling it are a user’s 
adversaries, then they should compromise the same set 
of the ISP’s routers. (This is true whether we model 
this compromise probabilistically, with k routers com- 
promised in expectation, or through some other means.) 
This might be modeled statically by changing the struc- 
ture of the BBN, but dynamic compromise and more 
general inter-adversary dependence may require other 
approaches. 

At this point, our system does not include instances 
in the world in constructs that correspond to cities or 
states/provinces. These are most naturally viewed as 
instances of legal jurisdictions, and the user may well 
have beliefs about the corresponding laws or enforce- 
ment regimes. One way that we envision the user may 
address these is by adding to the world instances of legal 
jurisdictions that carry a “Boundary” attribute, effec- 
tively a predicate that can be evaluated on the system- 
provided geolocation data. The system could then de- 
termine which network entities are in which of these 
user-supplied jurisdictions. Physical locations might be 
handled this way as well, as long as the location is “large 
enough” relative to the resolution of the geolocation pro- 
cess. 


5 Trust 

We now discuss where trust judgments might originate. 
Eirst, we present the rationale behind a trust policy that 
might be distributed with Tor client software as a de- 
fault. Such a policy would be designed not to offer the 
best protection to particular classes of users but to ade- 
quately protect most Tor users regardless of where they 
are connecting to the network or what their destinations 
and behaviors are. Second, we discuss other sources of 
trust information and some use cases. 

The most useful information about Tor relays 
for setting a default level of trust is probably relay 
longevity. Running a relay in order to observe traffic 
at some future time or for persistent observation of all 
traffic requires a significant investment of money and 
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possibly official authorization approval. This is all the 
more true if the relay contributes significant persistent 
capacity to the network. Further, operators of such re- 
lays are typically more experienced in many senses and 
thus somewhat less open to external compromise via 
hacking. The amount of relay trust is thus usefully tied 
to the length of presence in the network consensus, up- 
time, and bandwidth. This approach does not resist ar- 
bitrary large-budget, nation-state-scale adversaries with 
authority to monitor relays persistently, but it will help 
limit attacks to adversaries with such persistent capa- 
bilities and intentions. Resistance to particular nation- 
state adversaries would not make sense as a default trust 
policy for all Tor users worldwide. 

There is no general reason to trust one AS, IXP, 
etc., more than another, but one should not presume 
that they are all completely safe. It thus is reasonable 
to assume the same moderate risk of compromise for 
all elements forming the links to the Tor network and 
between the relays of the network when creating a de- 
fault trust policy. Though uniformly distributed, trust 
in these elements still plays a role in route selection. 
For example, a very high uniform level of trust would 
permit selection of routes through the same IXP if the 
trust in the selected relays themselves were found to be 
adequate. A lower level of trust might dictate a selec- 
tion despite the availability of higher- longevity relays 
because of the AS or IXP risk. Note that moderating 
AS and IXP trust can also mitigate persistent nation- 
state adversaries to some extent if we assume individual 
ASes and IXPs are more likely to be compromised by 
the countries in which they are located. 

Note that the average client using a default trust 
policy may be subject to errors because the average 
client’s beliefs will rarely be exactly at the default. For 
any policy a client uses, the client may be subject to 
errors in the judgments that underly the policy. 

Users with particular concerns might use non- 
default beliefs. These could be provided by, e.g., gov- 
ernment entities, privacy organizations, political groups, 
media organizations, or organizations defending abuse 
victims. An example of an important non-default case 
is connecting users to sensitive destinations that they es- 
pecially do not want linked to their location or possibly 
to their other Tor behaviors. For example, some users 
need to connect to sensitive employer hosts, and dissi- 
dent bloggers could be physically at risk if seen posting 
to controversial sites. These users may have rich trust 
beliefs (either of their own or supplied by their orga- 
nizations) about particular relays, ASes, etc., based on 
who runs the relay, hardware, location, etc. 


6 Modeling a Network Adversary 

We illustrate the use of our trust framework by consid- 
ering a powerful, pervasive adversary called The Man. 
This adversary follows the suggestions in Sec. 5 and thus 
is a plausible candidate for a default trust belief in Tor. 
We construct The Man by drawing on a variety of public 
data sets and evaluate its ability to compromise users’ 
paths. 

6.1 Constructing The Man 

We allow The Man to compromise relay families and AS 
or IXP organizations, where a family or organization is 
a group controlled by the same entity. Each family is 
compromised by The Man independently with proba- 
bility between 0.001 and 0.1, where the probability in- 
creases as the family’s longevity in Tor decreases. Specif- 
ically, the probability of compromise for a family / with 
uptime Uf was set to be (0.1 — (0.1 — 0.001))u/. Each 
AS and IXP organization is compromised independently 
with probability 0.1. 

To construct The Man adversary, we must create a 
routing map of the Internet that includes ASes, IXPs, 
and Tor relays. We must also group ASes and IXPs into 
organizations, identify relay families, and evaluate the 
longevity of Tor relays. We do so using the techniques 
and data sources described in Section 3.3. 

To build the routing map, we use CAIDA topol- 
ogy and link data from 12/13 and RouteViews data 
from 12/1/13. The resulting map includes 46,368 ASes, 
279,841 links between ASes, and 240,442 relationship 
labels. To group ASes by the organization that controls 
them, we use the results of Cai et al. [5]. These contain 
data about 33,824 of the ASes in our map, and they re- 
sult in 3,064 organizations that include more than one 
AS with a maximum size of 81 and a median size of 2. 
We use the results of Augustin et al. [3] to identify IXPs 
and their locations between pairs of ASes. These results 
show 359 IXPs and 43,337 AS-pairs between which at 
least one IXP exists. We then use the results of John- 
son et al. [22] to group IXPs into organizations. These 
produce 19 IXP organizations with more than one IXP, 
for which the maximum size is 26 and the median size 
is 2. 

We add relays to the routing map using Tor consen- 
suses and descriptors from Tor Metrics [30] . We use the 
Tor consensus of 12/1/13 at 00:00. The network at this 
time included 1,235 relays that were guards only, 670 re- 
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lays that were exits only, and 493 relays that were both 
guards and exits. The consensus groups relays into 152 
families of size greater than one, of which the maximum 
size was 25 and the median size was 2. Family uptime 
is computed as the number of assignments of the Run- 
ning flag to family members, averaged over the family 
members and the consensuses of 12/2013. We map the 
Tor guards and exits to ASes using Routeviews prefix 
tables from 12/1/13, 12/2/13, and 11/30/13, applied in 
that order, which is sufficient to obtain an AS number 
for all guards and exits. Note that we observe one exit 
relay that mapped to an AS that does not appear in our 
map, and so we add that additional AS. There are 699 
unique ASes among the guards and exits. 

We create paths from each AS in our map to each 
guard and exit AS. The median number of paths that 
we can infer to a guard or exit AS is 46,052 (out of 
the 46,369 possible). The maximum AS path length is 
12, and the median AS path length is 4. The maximum 
number of IXPs on a path is 18, and the median number 
is 0. 

The resulting BBN for The Man thus includes 
2398 relay variables (one for each guard and exit) and 
32,411,931 virtual links (one from each AS to each guard 
or exit AS). For any path missing from our routing map, 
we simply take the path to include only the source AS 
and destination AS. 

6.2 Analysis 

We consider security from 58 of the 60 most common 
client ASes as measured by Juen [24] (AS8404 and 
AS20542 do not appear in our map). Juen reports that 
these 58 ASes covered 0.951 of client packets observed. 

For each of our 58 client locations, we choose an exit 
and guard using Tor’s path-selection algorithm as imple- 
mented in TorPS [22]. Note that (among other consid- 
erations) this does ensure that the guard and exit don’t 
share the same family or /16 subnet. Then we sample 
The Man BBN to determine if the resulting circuit to 
the server is vulnerable to a first-last correlation attack. 

Over 100,000 trials, the minimum, mean, median, 
and maximum probabilities of compromise were 0.108, 
0.132, 0.127, and 0.164, respectively. 


7 MLATs and Their Effects 

Our ontology described above includes Mutual Legal 
Assistance Treaties (MLATs), which have not received 
significant previous attention in the study of attacks on 
Tor. If a user of our trust system is worried about state- 
level adversaries, then MLATs are potentially signifi- 
cant. To illustrate this, we do some preliminary analysis 
of the effects of MLATs on the ability of (randomly con- 
structed, composite) state-level adversaries to carry out 
first-last correlation attacks. We start with an overview 
of MLATs and the data we use about them. 

7.1 MLATs 

MLATs formally require and enable their signatories to 
cooperate in many aspects of criminal legal assistance, 
from investigations, to collection of evidence, to extradi- 
tion of targets or suspects. Most relevantly to the con- 
sideration of adversary power, this enables countries, 
through MLATs, to gain information from network com- 
ponents in other legal and governmental jurisdictions. 

MLATs have existed since ancient times. In the last 
half century, the number of MLAT relationships be- 
tween countries has grown sharply, and the number of 
countries that participate in MLATs at all has also in- 
creased [10]. 

Currently, thousands of MLATs are in some stage of 
negotiation or entry into force between different coun- 
tries. In order to analyze the effects of MLATs on first- 
last compromise in Tor, we make use of the MLAT 
database behind the www.MLAT.is [9] site described by 
Cortes [10]. This draws on treaty data from a variety of 
original sources, such as national governments and in- 
ternational organizations. It analyzes when and whether 
the treaties are actually in force, the countries subject to 
them, and what type of applications (e.g., extradition) 
the treaty has. 

MLATs vary in their strength, such as what kinds 
of exceptions they include, the strength of evidence col- 
lection they cover, and the extent to which one partner 
can coerce another to share information. While an in- 
creasing body of international case law exists with re- 
spect to MLATs, some MLATs remain untested in terms 
of how easily they can be used to get from a treaty 
partner information that is potentially covered by the 
MLAT. A country may be able to influence one of its 
MLAT partners — i.e., a country with whom it has a di- 
rect treaty — in invoke one of the partner’s treaties with 
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yet other countries in order for the original country to 
obtain more information. Such scenarios have less well- 
defined parameters and offer the original country less 
direct power. In our analysis in Sec. 7.4, we thus focus 
only on direct relationships instead of the transitive ap- 
plication of MLATs. We also restrict our attention to the 
MLATs that are most likely to be effective in the setting 
we consider here, namely the non-extradition, bilateral, 
criminal MLATs that are already in force. However, our 
framework can easily be adapted to consider, e.g., tran- 
sitive applications of MLATs or multilateral MLATs. 

7.2 Cable data 

In order to study the effects of MLATs on the power of 
adversaries, and to account for submarine cables as a 
network- component type of new interest, we randomly 
construct composite state-level adversaries (“pseudo- 
countries”) that comprise countries appearing in cable 
and MLAT data. As described in Section 7.3, this in- 
corporates weighting by cable bandwidth and account- 
ing. As noted above, we study the ways that in-force, 
bilateral, non-extradition, criminal MLATs expand the 
capabilities of the pseudocountries that we construct. 
The version of the MLAT. is data that we use provided 
559 applicable MLATs. 

We use Greg’s Cable Map (cablemap . inf o) [25] as 
our source for cable data. This includes bandwidth in- 
formation for many of the cables. Other sources, such 
as TeleGeography’s data [29], could be used instead or 
as well once bandwidth information was added. 

After some cleanup of the data as described in 
App. B, we were left with 222 cables with a total band- 
width of over 722,000 Gb/s. Different data sources may 
differ in the exact set of cable systems that they cover, 
and bandwidth data may be reported differently by dif- 
ferent sources or be unavailable. However, we believe the 
data we use are plausible, and they certainly demon- 
strate the feasibility of the approach we present here. 

To determine a country’s cable bandwidth, we 
counted the total bandwidth of all cables with landing 
points coded in that country. We use “MLAT reach” 
to mean the total bandwidth of all cables landing in a 
country or its MLAT partners. Figure 2 shows the coun- 
tries with at least 550,000 Gb/s in MLAT reach. The top 
bar shows the total bandwidth of all cables in the data 
set; the country-specific bars show each country’s band- 
width (light part of the bars) and MLAT reach (light 
and dark parts of the bars together). The ranking of 



Fig. 2. The amount of cable bandwidth (Gb/s) controlled by 
countries directly (light part of bars) and in collaboration with 
their first-degree MLAT partners (entire bars) for the 11 countries 
that control at least 550 Tb/s in collaboration with their MLAT 
partners. The top bar shows the total bandwidth of all cables in 
the data set. 

countries by MLAT reach is notably different than their 
ranking by bandwidth (see App. B). 

7.3 Pseudocountry construction 

We use the submarine-cable data to construct random, 
hyptothetical adversaries (“pseudocountries”) compris- 
ing countries that appear in the cable and MLAT data 
sets. While we believe the data we have obtained are 
generally plausible, using hypothetical adversaries al- 
lows us to analyze adversary capabilities without being 
distracted by issues surrounding the precise capabilities 
of real countries. 

In order to construct our pseudocountries, we use a 
randomized procedure that iteratively picks new coun- 
tries, weighted by country bandwidth, and adds them 
to the pseudocountry if they satisfy constraints that pa- 
rameterize the procedure. Figure 3 in App. G shows 
pseudocode for this procedure. 

As a preliminary step to randomly constructing the 
pseudocountry adversaries that we use here, we consid- 
ered various combinations of constraints. For each can- 
didate constraint combination, we ran numerous ran- 
dom trials and evaluated the resulting pseudocountries 
in terms of the amount of the bandwidth and number 
of cables that they control directly and might be able to 
access through their first-degree MLAT partners. This 
suggested the constraint combinations that we use here 
to get pseudocountries that control large, medium, and 
small amounts of cable bandwidth, both directly and 
with their MLAT partners. We treat a country C as 
an MLAT partner of a pseudocountry P if any one of 
the constituent countries in P is listed in the MLAT 
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data as a partner of C. The constraints also vary in the 
number of MLAT partners that each resulting pseudo- 
country has. Table 4 in App. C provides statistical in- 
formation about the pseudocountries generated by these 
constraints over 10,000 trials. After finalizing the con- 
straint sets, we then ran the randomized pseudocountry- 
construction procedure once with each of these chosen 
combinations and took the resulting output as the ad- 
versaries that we use here. 

The first constraint set we use is: 

- Allow at most one country that directly sees at least 

100.000 Gb/s (i.e., USA, COL, BRA, JPN, PRI, 
GBR, PAN, and CHN). 

- Allow at most two countries that directly see at least 

60.000 Gb/s but less than 100,000 Gb/s (i.e., ZAP, 
ECU, VGB, ABW, GYP, RUS, IND, SGP, ESP, and 
AGO). 

- Allow at most five countries in total 

The second constraint set we use is: 

- Allow the total capacity seen directly by the pseudo- 
country to be at most 50,000 Gb/s. 

- Allow the total capacity seen by the pseudocountry 
and its first-degree MLAT partners to be at most 

120.000 Gb/s. 

The third constraint set we use is: 

- Allow no country that, together with its first-degree 
MLAT partners, sees at least 500,000 Gb/s (i.e., 
USA, AUS, CAN, HKG, IND, GBR, BEL, CHE, 
ESP, DEU, MEX, ITA, UKR, ARG, SVK, BRA, 
ROU, SVN, ZAE, POL, HUN, GRC, and ERA). 

- Allow the total capacity seen directly by the pseudo- 
country to be at most 400,000 Gb/s. 

- If a new country (after the first one) is not an MLAT 
partner of one of the countries already in the pseu- 
docountry, then either the new country or all of the 
existing countries must have no MLAT partners at 
all. 

- Allow at most four countries in total. 

After deciding on these constraint sets using the 
statistics about the countries they randomly generated, 
we then randomly generated one pseudocountry (Pi, P 2 , 
and P3, respectively) from each constraint set for use as 
an adversary. Those pseudocountries and the pseudo- 
countries together with their MLAT partners (denoted 
Ml, M2, and M3) are: 

- Pi: AUS, ESP, GTM, JPN, and SGP 

- Mi: ARE, ARG, AUS, AUT, BEL, BOL, BRA, CAN, 
CHE, CHL, CHN, COL, CPV, CZE, DOM, DZA, 
ECU, ESP, PIN, ERA, GBR, GRC, GTM, HKG, 
HUN, IDN, IND, ISR, ITA, JPN, KAZ, KOR, LUX, 
MAR, MCO, MEX, MRT, MYS, NLD, PAN, PER, 



Pi 

Ml 

P 2 

M 2 

P 3 

M 3 

# Countries 

5 

53 

14 

22 

4 

9 

Capacity (10^ Gb/s) 

287 

678 

50 

112 

112 

250 

Capacity (%) 

39.8 

93.8 

6.9 

15.4 

15.4 

34.6 

Cables (out of 222) 

56 

189 

32 

53 

19 

68 


Table 1. Pseudocountry characteristics. 


PHL, PRT, PRY, SGP, SLV, SVN, SWE, THA, 
TUN, URY, USA, and ZAP 

- P2: ALB, ASM, DJI, GUP, GUM, HND, MNP, MSR, 
MUS, PNG, SLE, SYR, TZA, and WSM 

- M2: ALB, ASM, BEL, CZE, DJI, GRC, GUE, GUM, 
HND, HUN, IND, MNE, MSR, MUS, PNG, POL, 
ROU, SLE, SVK, SYR, TZA, and WSM 

- P 3 : ECU, PJI, GUM, and QAT 

- Ms: ARG, AUS, BOL, CHE, ECU, PJI, GBR, GUM, 
and QAT 

Table 1 shows statistics about the PiS and M^s in- 
cluding the number of constituent (real) countries, the 
cable capacity that they control (both absolute band- 
width and the fraction of the total cable bandwidth), 
and the number of cables they control out of the 222 in 
our data set. 

7.4 Analysis 

We study the effects of MLATs on compromise by con- 
sidering a variety of ways in which an adversary can co- 
operate with its MLAT partners. The differences reflect 
both various possible levels of coordination between a 
country and its MLAT partners as well as the potential 
difficulties with actually enforcing MLATs. The compro- 
mise models we consider are the following: 

Type P : The adversary compromises a path exactly 
when the pseudocountry appears on both virtual links 
in a path (i.e., between the source and the guard and 
between the exit and the destination). 

Type P-)-M: The adversary compromises a path ex- 
actly when the pseudocountry P appears on both vir- 
tual links or there is an MLAT partner M of P that 
appears on both virtual links. This models M sharing 
the results of its unilateral attacks with P. 

Type P-)-PM: The adversary compromises a path ex- 
actly when the pseudocountry P appears on both vir- 
tual links or there is an MLAT partner M of P such 
that P appears on one virtual link and M appears on 
the other. This models M sharing information with P 
that produces a coordinated attack but not sharing the 
results of its unilateral attacks. 
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Type P+M+PM: The adversary compromises a path 
exactly when the pseudocountry P appears on both vir- 
tual links or there is an MLAT partner M of P such that 
either M appears on both virtual links or P appears on 
one virtual link and M appears on the other. This mod- 
els M both sharing partial information to produce a 
joint attack and information about M’s unilateral at- 
tacks. 

For all types of coordination, we can also consider 
the effects using each MLAT partner with a specified 
probability p. Individual MLATs may turn out to be 
difficult to use for a particular application or take too 
long to apply. We model this by fixing a probability p = 
0.5 and, for each MLAT partner country C, including C 
in the adversary with probability p. (In particular, if C 
is included in the adversary, it is included for all paths.) 
This probability could easily be varied, or a different 
probability of enforcement could be assigned to each 
MLAT. For example, this would allow users who have 
beliefs about the degree to which different MLATs are 
effective (e.g., accounting for the factors discussed at 
the end of Sec. 7.1) to incorporate those beliefs into 
this framework. 

We use data obtained by Juen et al. [23] containing 
traceroutes from a selection of Tor relays to destinations 
randomly chosen from approximately 500,000 address 
blocks. We did not discard incomplete traceroutes, but 
we did omit traceroutes whose source relays were listed 
in 192.168.*.* and 10.*.*.*. This left us with paths from 
57 relays to destinations across the Internet. We used 
the GeoIP database [26] to geolocate the addresses in 
the traceroute data. 

We consider whether a pseudocountry adversary, 
under different coordination models described above, 
could observe both virtual links (outside of the Tor net- 
work) of a path constructed by combining two paths 
from the traceroute data set that use different relays. 
This treats all of the Tor relays as both guards and exits, 
an assumption made necessary by the limited number of 
relays, and assumes symmetric routing. Our focus here 
is on the extent to which MLATs increase the reach of 
adversaries, so we consider absolute numbers of paths 
rather than weighting by relay bandwidths. Because of 
the volume of the data, we do not attempt to weed out 
paths that start and end at the same IP but use differ- 
ent relays. However, the possible effect of these paths on 
our results is at most one part in 10®. 

Table 2 shows, for the coordination models of inter- 
est noted above, the fraction of paths observed under 
each model by each of the three pseudocountries con- 
structed in Section 7.3. It also shows the lO^**, 50*®, and 


Pseudocountry^ 

1 

2 

3 

Type P 

0.019 

0.000 

0.000 

Type P-l-M 

0.66 

0.005 

0.098 

Type P-I-PM 

0.248 

0.001 

0.006 

Type P-I-M-I-PM 

0.737 

0.006 

0.103 

Type P-l-M (p = 0.5; 10%) 

0.042 

0.000 

0.000 

Type P-l-M (p = 0.5; 50%) 

0.596 

0.003 

0.002 

Type P-l-M (p = 0.5; 90%) 

0.650 

0.005 

0.098 

Type P-hPM (p = 0.5; 10%) 

0.104 

0.000 

0.000 

Type P-I-PM (p = 0.5; 50%) 

0.192 

0.001 

0.001 

Type P-hPM (p = 0.5; 90%) 

0.239 

0.001 

0.006 

Type P-I-M-I-PM (p = 0.5; 10%,) 

0.127 

0.0 

0.001 

Type P-|-M-|-PM {p = 0.5; 50%) 

0.317 

0.003 

0.1 

Type P-I-M-I-PM (p = 0.5; 90%,) 

0.72 

0.006 

0.102 


Table 2. The fraction of paths in the universe considered that 
are compromised by each of the three pseudocountry adversaries 
with different types of coordination. Percentiles for probabilistic 
enforcement of MLATs are computed over 10,000 trials. 

90^^ percentiles of compromise fractions (over 10,000 tri- 
als) when each MLAT partner cooperates with probabil- 
ity 0.5. In all cases, MLATs can allow for much greater 
reach. 

8 Using Trust to Improve Path 
Selection 

As a proof of concept, we examine how trust might be 
used to improve security in Tor. In particular, we con- 
sider how trust might be used to prevent the first-last 
correlation attacks by The Man (introduced in Sec. 6) 
when accessing a given online chat service. We suppose 
that users use trust to choose paths that are less likely 
to be vulnerable to this attack and run experiments 
to evaluate how effective this might be. These exper- 
iments just show the potential for improvement from 
using trust; they do not take into account other attacks 
or how to maintain good performance. 

8.1 Experiments 

Against The Man, we examine both how users can 
choose more-secure paths through Tor and how the ser- 
vice can choose server locations to make them more se- 
curely accessible via Tor. 

For our experiments, we use as the destination ser- 
vice the Web chat server webirc.oftc.net. This IRC 
service is run by the Open and Free Technology Commu- 
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nity and is popular with Tor developers. As in Sec. 6.2, 
we consider users coming from 58 of the top 60 client 
ASes measured by Juen. In addition, for all of our ex- 
periments, the compromise probability (i.e., the prob- 
ability of a first-last correlation attack by The Man) 
is estimated by sampling from The Man BBN 100,000 
times and using the fraction of compromised samples as 
the probability. 

The algorithms we use in our experiments are as 
follows: 

- Clients use trust: Guards are chosen for each client 
location to be the three relays with the smallest prob- 
abilities that the adversary compromises the guard or 
an AS or IXP on the path to the guard. Then for a 
given destination, the algorithm considers using each 
of the client location’s three guards with each Tor 
exit relay, estimates the probability of first-last com- 
promise, and chooses the guard and exit with lowest 
resulting probability. 

- Service uses trust: We only consider each AS 
containing an exit relay as a possible location for 
the server because these locations have the minimal 
chance for the adversary to observe traffic between 
the exit and destination. For each potential server lo- 
cation, we compute the probability of first-last com- 
promise for each client location. This is estimated 
for a given client location by considering each of its 
guards, considering each exit sharing the server lo- 
cation, estimating the compromise probability, and 
using the minimum of these probabilities. We choose 
the server location with the minimum average com- 
promise over all client locations. We add each addi- 
tional server greedily by repeating the same process 
except that we only update the compromise proba- 
bility for a client location if it decreases when using 
the new potential server location. 

8.2 Analysis 

Our results are shown in Table 3. For ease of compari- 
son, the first row shows the results presented in Sec. 6 
for a client using Tor’s default path-selection algorithm. 
We can see that by using trust to choose guard and exit 
relays, clients can reduce the compromise probability 
by a factor of over 2.8 on average. When in addition the 
service changes the location of its server, that probabil- 
ity drops again by a factor of over 2.7 and approaches 
the minimum possible of (0.1)^ = 0.01. It appears that 
adding additional server locations does not add signif- 
icantly to user security. Note that each probability is 



Mean 

Median 

Min 

Max 

Default Tor 

0.132 

0.127 

0.108 

0.164 

Only client uses trust 

0.046 

0.049 

0.026 

0.091 

Client-Fservice, 1 server 

0.017 

0.018 

0.009 

0.033 

Client-Fservice, 2 servers 

0.017 

0.017 

0.009 

0.034 

Client-Fservice, 3 servers 

0.017 

0.017 

0.009 

0.033 


Table 3. Fi rst— last correlation probabilities against The Man for 
58 client locations 


estimated with 100,000 samples, which can explain why 
some probabilities are slightly below 0.01 and why the 
probabilities sometimes increase slightly when a server 
is added. 

8.3 Discussion 

Note that our evaluation of these algorithms serves as 
a proof of concept for how our trust framework might 
be used. We do not propose that the trust-based algo- 
rithms we evaluate should be used in Tor exactly as de- 
scribed. Choosing paths in Tor based on the underlying 
network topology potentially creates security vulnera- 
bilities outside of first-last correlation. For example, as 
has been observed in other work using trust and net- 
work location [1, 20, 21], the adversary could place his 
own Tor relays in locations that a user is more likely 
to select, and identities of the relays on a path chosen 
based on a client’s trust and location may themselves 
reveal information about the client. In addition, path 
selection must take load balancing into account in or- 
der for Tor to maintain adequate performance. We leave 
as an open problem designing path-selection algorithms 
that use trust and location information in a way that 
convincingly improves security while maintaining per- 
formance. 

We also note that the server-location algorithm, 
while it may appear overly optimistic, does seem plausi- 
ble in many important use cases. Services with few Tor 
customers or without enough resources to run multiple 
servers for a diverse client base do seem unlikely to be 
able to move their servers to improve security as much 
as we could in our experiments. However, we can imag- 
ine that users might run servers for personal use and 
choose to locate them in a way that keeps them secure 
against their adversaries. It also seems plausible that 
large organizations with privacy-conscious constituen- 
cies, such as banks or governments, could run multiple 
redundant servers in order for their clients to choose a 
location that can be accessed securely. 
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9 Conclusions and Future Work 

In this paper we have outlined a general but practical 
approach to represent network trust for the purposes 
of anonymous communication. Our approach represents 
network trust as an arbitrary probability distribution 
over the possible sets of network elements observed by 
the adversary, and we describe how to conveniently and 
efficiently represent and express such distributions. Our 
model allows trust in general network elements, not just 
relays, and our adversary distributions are more general 
than previously considered. This model can be used to 
inform the user’s path selection in Tor, helping her to 
avoid first-last correlation attacks. 

We have introduced two novel adversaries that can 
be expressed and analyzed using our trust framework. 
First, we presented The Man, which represents a per- 
vasive, powerful adversary that we have argued could 
be used as the default trust belief in Tor. Second, we 
discussed the risk of Mutual Legal Assistance Treaties 
(MLATs), both incorporating these into our system and 
analyzing their effects on adversary capabilities under 
different models of coordination between the adversary 
and its MLAT partners. 

We have also carried out preliminary experiments 
that show that the potential for our notion of trust to 
reduce the probability of first-last correlation against 
a client who uses trust to inform path selection. This 
probability of attack is further reduced when the service 
accessed by the user has positioned its server(s) in a way 
that is also informed by our work. 

Ongoing and future work includes the further de- 
velopment and investigation of Tor path-selection algo- 
rithms that use trust as formalized here, the further de- 
velopment and analysis of methods to express trust that 
are natural and usable, the continued analysis of possi- 
ble trust errors and their effects, and the development of 
a user interface for importing or entering trust beliefs. 
Two particularly important tasks are the development 
of collections of trust beliefs that capture important use 
cases and the study of how users can use different trust 
beliefs without being identified by that behavior. 

Another future direction for research is making the 
MLAT model more complex, accounting for multilat- 
eral MLATs and variations in enforcement probabil- 
ity between different MLATs. Our focus in consider- 
ing MLATs was their effect on the adversary’s reach 
as a fraction of paths, but this question would also be 
interesting in a non-uniform setting corresponding to 
Tor’s usage patterns. Another topic of ongoing and fu- 


ture work is the modeling of the adversary’s control of 
individual submarine cables. 
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A A language for predicates 

We expect that the user may want to express some of 
her beliefs (trust and perhaps also structural) in terms 
of predicates, even though she might not be able to ef- 


fectively evaluate these herself. For example, the user’s 
trust in ASes with very few routers might be different 
than her trust in ASes with many routers (perhaps be- 
cause she believes that larger ASes are more likely to 
have processes, policies, and organizational experience 
that prevent misconfiguration). She might capture this 
with a predicate that expresses whether the number of 
routers in an AS (in the edited world) is at least as great 
as a specified threshold. 

The belief languages must thus incorporate a lan- 
guage for predicates that the system can interpret. We 
treat the predicate language as a separate component, 
and we sketch here one predicate language that will be 
used by all of our example belief languages. This lan- 
guage includes: 

Connectives and operators Basic logical connec- 
tives (including negation) 

Typing Testing whether an instance or attribute is or 
is not of a specified type; users may test for types not 
in the ontology (e.g., to check types that they have 
added) 

Sets Sets (explicitly enumerated or defined by some 
predicate) and set membership/non- membership 
Membership A predicate may depend on a set and 
test whether a value is in that set. 

Tests of attribute values Tests must be appropriate 
to the data type used in the attribute; equality and 
inequality tests are allowed unless specified other- 
wise. Predicates may test user-defined attributes.^ 
This may reference user-defined attributes. 

Tests of the world structure (trust beliefs only) 
After the world is constructed and edited (i.e., when 
applying trust beliefs but not when applying struc- 
tural beliefs), we allow predicates in beliefs to refer 
to the structure of the world. 


B Cable Data 

As noted above, we did some minor cleanup of the ca- 
ble data. As this work is intended to demonstrate the 
feasibility of our approach, we do not attempt to con- 
struct a definitive description of international subma- 
rine cables. The changes that we make to the original 


7 We expect that user-defined attributes will only be tested 
by the user, e.g., through predicates that she specifies on those 
attributes. As noted in the construction sequence in Section 2.1, 
the system will not change the structure of the world based on 
user-defined attributes. 
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data demonstrate the flexibility of this system as more 
and newer information becomes available to a user. We 
note that different sources sometimes provide different 
descriptions of cables, especially with respect to band- 
width. As discussed below, we have attempted to pro- 
vide capacity data for cables lacking it in the original 
data set. 

We coded the landing points in the data using the 
ISO-3166 country list [18]; we followed the MLAT data 
set when questions arose of which should be considered 
independent. We omitted cables that the data appear to 
indicate are not live, go over land, or are no longer used 
for general Internet traffic. The coding of cable land- 
ings required a little cleanup to match the appropriate 
countries from the MLAT. is database. After this initial 
cleanup, we checked all cables that were listed as landing 
in only one country or that lacked a listed capacity. 

For cables without capacities listed, the then- 
current Submarine Cable Almanac [27] provided values 
that we use as follows: BBG (Bay of Bengal Gateway), 
30 Tb/s; TGN-Gulf, 1.28 Tb/s; ADRIA-1, 622 Mb/s; 
Suriname-Guyana (SG-SGS), 1.28 Tb/s; GeltixConnect, 
960 Gb/s. For the America Movil-1 cable, we use a press 
release [2] as the source for a 50 Tb/s capacity. For the 
Emerald Bridge cable, the cable website notes that it 
has 96 fibers [13]; we take 9.6 Tb/s as a guess for its 
total capacity. For the Vanuatu-Fiji Interchange Cable 
Network, we use the 1.28 Tb/s figure from the cable 
website [16]. For cables Melita-1 and WARF, no capac- 
ity data appears to be available. We use non-zero guesses 
of 100 Gb/s for each of these cables (whose in-service 
dates from the original data set are 2009 and 2007, re- 
spectively). 

For cables with fewer than two countries listed in the 
set of landing countries that we generate, we examine 
the cables in more detail. We generally omit those that 
are listed as landing in only one country. Both FLAG 
Atlantic (FA-1) and FLAG ATLANTIG NORTH are 
listed as landing in a single country (USA and GBR, 
respectively); based on the Submarine Gable Almanac, 
we treat these as a single cable with landings in USA, 
GBR, and FRA. Based on the cable’s website [17], we 
take the landings of the Vanuatu-Fiji Interchange Gable 
Network to be VUT and FJI. 

Figure 4 shows the amount of total bandwidth of 
cables landing in countries that have at least 60,000 
Gb/s of such bandwidth. We note that adding band- 
width from cables landing in first-degree MLAT part- 
ners significantly changes the bandwidth rankings. For 
example, GOL, BRA, JPN, and PRI rank in positions 
two through five when considering only country band- 


width, but they do not rank in the top 15 in MLAT 
reach (compare with Fig. 2 in Section 7.2). 

C MLAT Adversary Construction 

Before generating the pseudocountries used in our anal- 
ysis, we repeatedly tested possible constraint sets us- 
ing the pseudocountry-generation algorithm described 
in Fig. 3. Table 4 shows statistics for each of the three 
constraint sets when used in 10,000 random trials. For 
each constraint. Tab. 4 shows the lO***, 25**^, 50**’ (me- 
dian), 75**’, and 90**’ percentiles, as well as the mean, 
over these random trials for the number of countries in 
the pseudocountry, the cable capacity seen directly by 
the pseudocountry, the number of countries in the pseu- 
docountry together with its MLAT partners, and the 
MLAT reach of the pseudocountry. We used this type 
of statistical output to tweak our constraint sets. Once 
those sets were finalized, we then randomly generated 
one pseudocountry for each of the three constraint sets 
to obtain the adversaries described in Sec. 7. 
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repeat 

Select cable at random (bandwidth weighted) 

if No landing country on the cable is part of the pseudocountry then 
repeat 

Pick a landing country from the cable at random (bandwidth weighted) 
if Allowed by country constraints then 
Add that country to the pseudocountry 

end if 

until A country is added to the pseudocountry or no more countries left to try on this cable 

end if 

until The maximum number of countries (15 if not otherwise specified in the constraints) has been added or 
there are no more cables to try. Restart the process if there are fewer than two countries. 

Fig. 3. Pseudocode for constructing pseudocountries, parameterized by country constraints 



Fig. 4. The amount of cable bandwidth (Gb/s) controlled by 
countries directly for the 18 countries that control at least 60 
Tb/s directly. 




Set 1 

Set 2 

Set 3 

Countries 

10% 

5.0 

8.0 

4.0 

25% 

5.0 

9.0 

4.0 

50% 

5.0 

11.0 

4.0 

Mean 

5.0 

11.1 

4.0 

75% 

5.0 

13.0 

4.0 

90% 

5.0 

15.0 

4.0 

Capacities 

10% 

247,498 

49,843 

164,429 

25% 

311,760 

49,924 

205,263 

50% 

357,764 

49,984 

251,436 

Mean 

370,197 

49,813 

252,436 

75% 

440,099 

49,999 

301,889 

90% 

478,859 

50,000 

340,469 

Partners 

10% 

23.0 

14.0 

6.0 

25% 

37.0 

17.0 

7.0 

50% 

56.0 

20.0 

9.0 

Mean 

54.3 

19.9 

12.0 

75% 

69.0 

24.0 

18.0 

90% 

83.0 

25.1 

23.0 

ML AT Reach 

10% 

605,283 

56,419 

316,215 

25% 

640,807 

66,918 

414,541 

50% 

685,620 

102,179 

476,768 

Mean 

670,296 

92,435 

451,961 

75% 

708,884 

115,994 

513,251 

90% 

714,749 

118,352 

538,208 


Table 4. Statistics from 10,000 trials for pseudocountries gener- 
ated by the constraint sets considered here. 


